R is a programming language designed to help you perform
statistical analysis, create graphics, and later on write your own
statistical software. R is becoming increasingly popular
and knowledge of R will help you on the job market. R is
probably the most versatile statistical tool out there (and it’s free
and open-source so you can literally use it anywhere). It is, for example,
used in all fields of academia, from biology to economics, and outside
academia as well.
RStudio is a great graphical user interface for R. In
recent years, a growing number of features have been added to this
graphical user interface, which makes it the preferred choice for
learning R, especially among beginners. You can think about
it as R being the engine of the car and RStudio being the
dashboard.
RStudio projects make it straightforward to divide your work into multiple contexts, each with its own working directory, workspace, history, and source documents. A project is basically a folder on your computer that holds all the files relevant to a particular piece of work. Working in RStudio Projects has multiple advantages:
When you open a project, a new R session
(process) is started. This makes sure that things you do in different
projects do not interfere with each other.
Git is a version control system that makes it easy to track changes
and work on code collaboratively. GitHub is a hosting service for
git. You can think of it as a public Dropbox for code but
on steroids. With version control, you will build your projects
step-by-step, be able to come back to any version of the project, and
accompany everything with human-readable messages.
As a student, you even get unlimited private repositories, which you can use if you don’t feel like sharing your code with the rest of the world (yet). We will use private repositories to distribute code and assignments to you, and you will use them to keep track of your code and collaborate in teams.
With git, writing code for a project will look somewhat like this:
A Git repository is a space where you store and manage a project. It contains all of your project’s files and stores each file’s revision history. It’s common to refer to a repository as a repo.
We will use one repository for each lab and one repository for each homework assignment. You can directly import (“pull”) repositories via RStudio and save them on your computer. If you changed something in your project, you can easily upload (“push”) the new version to GitHub. GitHub will keep track of all changes you made over time within your project.
Our workflow will appear a bit tricky at the beginning, but we are sure that you will handle it with ease very soon. We assume that by now you have downloaded and installed R and RStudio and have your personal GitHub account.
The course has its own page on GitHub, which you can find here: https://github.com/uni-mannheim-qm-2023. This is the place where you can find all relevant material for the lab sessions. It is also the place where you download and hand in your homework assignments.
So how does this work?
1. Go to https://github.com/uni-mannheim-qm-2023 and click on the repository for the current week (this week, this is called week01_introduction). Now, click on the green Clone or download button and select Use HTTPS (this might already be selected by default, and if it is, you’ll see the text Clone with HTTPS as in the image below). Click on the clipboard icon to copy the repo URL.
2. In RStudio, click on File on the top bar and select New Project....
3. Select Version Control.
4. Select Git.
5. Paste the repo URL into the Repository URL window. Click on Browse to select the folder on your computer where you want to store the project.
6. Click on Create Project.
7. Open the .Rmd file that is stored in the project (in week 1, this is called QM2023_Week01.Rmd).
The RStudio interface has four panes:
Enough preparation, let’s finally dive into R!
R can perform basic math operations. Here are some examples:
1 + 1
## [1] 2
Some more calculations:
2 - 3
## [1] -1
4 * 5
## [1] 20
2^2
## [1] 4
4 / 2
## [1] 2
2^(1 / 2)
## [1] 1.414214
If you place parentheses correctly, R incorporates the order of operations.
((2 + 2) * 2)^2
## [1] 64
This should give the same result as before.
(4 * 2)^2
## [1] 64
But this of course gives a different result:
(2 + 2 * 2)^2
## [1] 36
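Here is a small extra sketch of the order of operations (our own example, not part of the lab file). It shows that R evaluates the exponent before multiplication and before a leading minus sign, so parentheses are the safe way to say what you mean.

```r
# The exponent is evaluated before the multiplication
2 * 3^2    # same as 2 * (3^2), gives 18
(2 * 3)^2  # parentheses force the multiplication first, gives 36

# The exponent is also evaluated before a leading minus sign
-2^2       # same as -(2^2), gives -4
(-2)^2     # parentheses make the base negative, gives 4
```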
You can also use other math functions you know from your calculator:
this is \(\sqrt{2}\)
sqrt(2)
## [1] 1.414214
when you do not specify the base, R uses the natural log with base \(e\), i.e. \(\log_e(10)\)
log(10)
## [1] 2.302585
but R can also use a different (virtually any) base, e.g. \(\log_{10}(10)\)
log(10, base = 10)
## [1] 1
or with base = 5, i.e. \(\log_5(10)\)
log(10, 5)
## [1] 1.430677
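If you want to convince yourself that the base argument does what we claim, the following sketch checks it against the change-of-base rule \(\log_b(x) = \log(x) / \log(b)\). The shortcuts log10() and log2() are base R functions that we did not use above; the numbers are just illustrative.

```r
# Change of base: log to base 5 equals natural log divided by log(5)
log(10, base = 5)
log(10) / log(5)
all.equal(log(10, base = 5), log(10) / log(5))  # TRUE

# Shortcuts for two common bases
log10(1000)  # 3
log2(8)      # 3

# exp() undoes the natural logarithm
log(exp(3))  # 3
```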
Pro tip: Always close your parentheses!
It is hard to understand pure code, especially for someone who did not write it (and future-you will also have a hard time understanding it).
Pro tip: Add comments to your code, describing what you are doing and why you are doing it.
With comments:
Comments start with the # symbol, and everything after a # on the same line will be commented
out.
# this is a comment
1 + 1 # This line runs
## [1] 2
# 1 + 1 This line does not run
Good coding style is like using correct punctuation.
Youcanmanagewithoutitbutitsuremakesthingseasiertoread.. – Hadley Wickham
But I already do have a calculator. Why do I need R?
R is so much more! R is an object-oriented programming language.
We use <- as the assignment
operator. Examples:
lucky_number <- 7
# Now we created a (numeric) object called "lucky_number"
lucky_number
## [1] 7
The class() command lets us check the type of an
object:
lucky_number <- 7
class(lucky_number)
## [1] "numeric"
Let’s see how this works live, this time with a character object:
firstname <- "" # This is a character object
firstname
## [1] ""
class(firstname)
## [1] "character"
lastname <- ""
lastname
## [1] ""
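To see how class() reacts to different kinds of objects, here is a short sketch. The object names (example_number, example_text, example_logical) are made up for this illustration, and is.numeric() and is.character() are base R helpers we have not used so far.

```r
example_number <- 42          # a numeric object
example_text <- "forty-two"   # a character object
example_logical <- TRUE       # a logical object (TRUE or FALSE)

class(example_number)   # "numeric"
class(example_text)     # "character"
class(example_logical)  # "logical"

# These helpers answer yes/no questions about the type
is.numeric(example_number)    # TRUE
is.character(example_number)  # FALSE

# Quotes matter: "42" is text, not a number
class("42")  # "character"
```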
Your turn: Here is your very first exercise!
Pro tip: Copy the lines of code that worked for something similar. Then, adjust the code according to your problem. That’s how coding works most of the time!
Create three objects:
1. `my_lucky_number` should contain your lucky number.
2. `my_firstname` should contain your firstname.
3. `my_lastname` should contain your lastname.
After you created the objects, call them separately. Don’t forget to add comments to your code.
What kind of data can I store in R? There are different types of objects that can contain different types and sets of data: vectors, matrices, data frames, arrays, and lists.
We will go through all of these object types below. On top of that we will also learn how to calculate the measures of central tendency and variability with vectors.
Let’s start with vectors. We want a vector of the numbers 1, 2, 3, 4 and 5. How do I assign this set of numbers to a vector?
The c() function
combines single values into a vector:
example_vec <- c(1, 2, 3, 4, 5)
example_vec
## [1] 1 2 3 4 5
This also works for characters/strings:
country_code <- c("DE", "FR", "NL", "US", "UK")
country_code
## [1] "DE" "FR" "NL" "US" "UK"
And it also works for a combination of numbers and characters:
example_vec2 <- c("Welcome", "to", "the", "lab", "in", "A", 5)
example_vec2
## [1] "Welcome" "to" "the" "lab" "in" "A" "5"
What if we start with numbers?
example_vec3 <- c(1, 2, 3, 4, 5, "R can count!")
example_vec3
## [1] "1" "2" "3" "4" "5"
## [6] "R can count!"
Note that if you have a character field in your vector, R will turn ALL values into character data! (You can see that by the quotes around the values)
Let’s check the type of data by using the class()
command on example_vec3.
example_vec3 <- c(1, 2, 3, 4, 5, "R can count!")
class(example_vec3)
## [1] "character"
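What if you need the numbers back? The sketch below uses as.numeric(), a base R function not covered above, on a made-up vector called mixed_vec. Text that does not look like a number becomes NA. R would normally print a warning about this, which we suppress here only to keep the example short.

```r
mixed_vec <- c(1, 2, "three")  # hypothetical vector, stored as character
class(mixed_vec)               # "character"

# Convert back to numbers; "three" cannot be converted and becomes NA
converted_vec <- suppressWarnings(as.numeric(mixed_vec))
converted_vec
class(converted_vec)           # "numeric"

# is.na() shows which entries could not be converted
is.na(converted_vec)           # FALSE FALSE TRUE
```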
You can use mathematical functions on each element in numeric vectors/matrices etc.
example_vec <- c(1, 2, 3, 4, 5)
sqrt(example_vec) # Take the square root of each element in example_vec
## [1] 1.000000 1.414214 1.732051 2.000000 2.236068
What about multiplication?
example_vec <- c(1, 2, 3, 4, 5)
example_vec * 10
## [1] 10 20 30 40 50
There are also some functions that you can use on the whole vector.
example_vec <- c(1, 2, 3, 4, 5)
sum(example_vec) # Question: What does sum() do?
## [1] 15
length(example_vec) # Question: What does length() do?
## [1] 5
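These two functions are already enough to compute a first measure of central tendency by hand. The following sketch builds the mean from sum() and length() and compares it with R's built-in mean(). The functions median(), var() and sd() are shown only as a preview, and manual_mean is a name we made up.

```r
example_vec <- c(1, 2, 3, 4, 5)

# The mean is the sum of all values divided by the number of values
manual_mean <- sum(example_vec) / length(example_vec)
manual_mean        # 3
mean(example_vec)  # 3, the built-in function gives the same result

# A preview of other summary functions for vectors
median(example_vec)  # 3
var(example_vec)     # 2.5
sd(example_vec)      # square root of the variance
```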
Matrices in R are two-dimensional table objects. In R, matrices are always row by column (like roller coaster, Roman Catholic or Ray Charles).
In a matrix, all data must be of the same type. If you mix numeric and character entries, the matrix will be all character just like in a vector.
How do I create a matrix in R?
example_mat1 <- matrix(c(1, 2, 3, 4, 5, 6),
nrow = 3,
ncol = 2
)
example_mat1 # How did R fill the numbers in the matrix?
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
You could also change the options and let R fill the matrix by rows (instead of columns):
example_mat2 <- matrix(c(1, 2, 3, 4, 5, 6),
nrow = 3,
ncol = 2,
byrow = TRUE
)
example_mat2 # See the difference?
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4
## [3,] 5 6
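A quick way to check how a matrix was filled is to look at its dimensions and at single cells. The sketch below uses two made-up matrices, by_col and by_row, and the base R functions dim() and t() (transpose), which we have not introduced above.

```r
by_col <- matrix(1:6, nrow = 3, ncol = 2)                # filled column by column
by_row <- matrix(1:6, nrow = 3, ncol = 2, byrow = TRUE)  # filled row by row

dim(by_col)  # 3 2, i.e. rows first, then columns

# Same cell, different value, depending on the fill direction
by_col[1, 2]  # 4
by_row[1, 2]  # 2

# Filling a 3 x 2 matrix by row gives the transpose of a
# 2 x 3 matrix that was filled by column
all(by_row == t(matrix(1:6, nrow = 2, ncol = 3)))  # TRUE
```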
Or you could create a matrix from different vectors by using
column-bind on two or more vectors. It works similarly to the
c() function but with vectors as input instead of
scalars.
Let’s first create two vectors of the same length:
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
# And now column-bind - cbind() - the two vectors.
example_mat3 <- cbind(vec1, vec2)
example_mat3
## vec1 vec2
## [1,] 1 7
## [2,] 2 8
## [3,] 3 9
## [4,] 4 10
## [5,] 5 11
## [6,] 6 12
Similarly, we can row-bind – rbind() – the two
vectors:
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
example_mat4 <- rbind(vec1, vec2)
example_mat4
## [,1] [,2] [,3] [,4] [,5] [,6]
## vec1 1 2 3 4 5 6
## vec2 7 8 9 10 11 12
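To make the difference between the two functions concrete, this sketch compares the shapes of both results. The names col_bound and row_bound are ours; dim(), colnames(), rownames() and t() are base R functions.

```r
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)

col_bound <- cbind(vec1, vec2)  # vectors become columns
row_bound <- rbind(vec1, vec2)  # vectors become rows

dim(col_bound)  # 6 2
dim(row_bound)  # 2 6

# The vector names end up as column names or as row names
colnames(col_bound)  # "vec1" "vec2"
rownames(row_bound)  # "vec1" "vec2"

# One result is the transpose of the other
all(t(col_bound) == row_bound)  # TRUE
```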
Data frames are two-dimensional table objects, just like matrices. Most data you will analyze in R will be in this form.
Just as with cbind(), you can create data frames from vectors
using data.frame():
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
example_df1 <- data.frame(vec1, vec2)
example_df1
## vec1 vec2
## 1 1 7
## 2 2 8
## 3 3 9
## 4 4 10
## 5 5 11
## 6 6 12
However, data frames are always column-bound vectors. In a data frame, everything within a column has to be of the same data type. But you can mix data types between columns:
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
vec3 <-
c(
"First Row",
"Second Row",
"Third Row",
"Fourth Row",
"Fifth Row",
"Sixth Row"
)
example_df2 <- data.frame(vec1, vec2, vec3)
example_df2
## vec1 vec2 vec3
## 1 1 7 First Row
## 2 2 8 Second Row
## 3 3 9 Third Row
## 4 4 10 Fourth Row
## 5 5 11 Fifth Row
## 6 6 12 Sixth Row
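You can ask R which data type each column has. This is a minimal sketch with a made-up data frame (example_df and its column names are our own). It uses sapply() and str(), two base R functions not covered above. We set stringsAsFactors = FALSE explicitly so that text stays character in older R versions too.

```r
example_df <- data.frame(
  number = c(1, 2, 3),
  label = c("First Row", "Second Row", "Third Row"),
  stringsAsFactors = FALSE
)

# Apply class() to every column of the data frame
sapply(example_df, class)

# str() gives a compact overview: dimensions, column names and types
str(example_df)
```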
You can also name your columns/variables. Either when creating your data frame:
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
vec3 <-
c(
"First Row",
"Second Row",
"Third Row",
"Fourth Row",
"Fifth Row",
"Sixth Row"
)
example_df3 <- data.frame(
variable_1to6 = vec1,
variable_7to12 = vec2,
variable_rows = vec3
)
example_df3
## variable_1to6 variable_7to12 variable_rows
## 1 1 7 First Row
## 2 2 8 Second Row
## 3 3 9 Third Row
## 4 4 10 Fourth Row
## 5 5 11 Fifth Row
## 6 6 12 Sixth Row
Or by renaming an existing data frame.
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
vec3 <-
c(
"First Row",
"Second Row",
"Third Row",
"Fourth Row",
"Fifth Row",
"Sixth Row"
)
example_df3 <- data.frame(vec1, vec2, vec3)
# Rename the variables of an existing data frame
names(example_df3) <- c("variable.1", "variable.2", "variable.3")
example_df3
## variable.1 variable.2 variable.3
## 1 1 7 First Row
## 2 2 8 Second Row
## 3 3 9 Third Row
## 4 4 10 Fourth Row
## 5 5 11 Fifth Row
## 6 6 12 Sixth Row
We can also add a new variable to an existing data frame. We simply create a data frame which consists of a data frame and a vector:
example_df4 <-
data.frame(example_df3,
variable_4 = c(90, 91, 92, 93, 94, 95))
example_df4
## variable.1 variable.2 variable.3 variable_4
## 1 1 7 First Row 90
## 2 2 8 Second Row 91
## 3 3 9 Third Row 92
## 4 4 10 Fourth Row 93
## 5 5 11 Fifth Row 94
## 6 6 12 Sixth Row 95
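One thing to keep in mind is that the new vector has to fit the number of rows. The sketch below uses a made-up data frame (small_df) and tryCatch(), a base R function for catching errors that is beyond this lab, only to show the mismatch without stopping the script.

```r
small_df <- data.frame(id = c(1, 2, 3))  # hypothetical data frame with 3 rows

# A vector with 3 values fits
small_df <- data.frame(small_df, score = c(90, 91, 92))
dim(small_df)  # 3 2

# A vector with 2 values does not fit 3 rows, so R raises an error.
# tryCatch() turns that error into a short message of our own.
result <- tryCatch(
  data.frame(small_df, too_short = c(1, 2)),
  error = function(e) "error: lengths differ"
)
result
```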
Arrays are like matrices, except that they are typically three-dimensional. You’re not going to see many of these, but we’ll introduce them for completeness. Here is an illustration of what a three-dimensional array could look like:
You can think of ten 3 x 5 bingo cards that all display spaces 1 through 15, for example, as an array. If I were to display that in R, I would use the array() function to write:
bingo_array <- array(seq(1, 15, 1),
dim = c(3, 5, 10))
bingo_array
The general syntax for this function is
array(values you want to array, dim = c(rows, columns, height)).
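Selecting from an array works with one index per dimension. The following sketch rebuilds the bingo array and picks out one card and one cell. It relies on the fact that R reuses the 15 values to fill all ten cards.

```r
bingo_array <- array(seq(1, 15, 1), dim = c(3, 5, 10))
dim(bingo_array)  # 3 5 10

# Leaving an index empty selects everything in that dimension:
# this is the complete first card, a 3 x 5 matrix
bingo_array[, , 1]

# Row 2, column 3 of the tenth card
bingo_array[2, 3, 10]  # 8

# The 15 values are recycled, so every card looks the same
identical(bingo_array[, , 1], bingo_array[, , 10])  # TRUE
```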
List objects can contain a series of the other objects we just learned about. A single list can contain a value, a vector, a matrix, AND a data frame - or many of each!
How do I make a list?
Use the list()
function!
# create a vector
example_vec <- c(1, 2, 3, 4, 5, 6, 7, 8)
# create a matrix
example_mat <- matrix(c(1, 2, 3, 4, 5, 6),
nrow = 3,
ncol = 2)
# create an array
example_array <- array(seq(1, 15, 1), dim = c(3, 5, 10))
example_vec3 <- c(1, 2, 3, 4)
# Store all objects in a list
example_list <- list(example_vec, example_mat, example_array)
example_list
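Getting things back out of a list takes double square brackets. This is a sketch with a small made-up list. Giving the elements names (numbers, grid) is our own choice for readability; the list above is unnamed.

```r
small_list <- list(
  numbers = c(1, 2, 3),
  grid = matrix(1:4, nrow = 2, ncol = 2)
)

length(small_list)  # 2
names(small_list)   # "numbers" "grid"

# Double brackets (or $) return the stored object itself
small_list[["numbers"]]
small_list$grid
class(small_list[["numbers"]])  # "numeric"

# Single brackets return a smaller list
class(small_list["numbers"])    # "list"

# Selections can be chained: first row, second column of the second element
small_list[[2]][1, 2]  # 3
```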
Sometimes we want to select single or multiple data entries from our
objects. We can do this by selecting elements via [].
Let’s first do it with a vector. Remember our country_code vector?
country_code <- c("DE", "FR", "NL", "US", "UK")
country_code
## [1] "DE" "FR" "NL" "US" "UK"
Let’s say we only want to select the “US”. We can achieve this by accessing the value via its position in the vector:
country_code[4]
## [1] "US"
Now we want to select all values but the “US”:
country_code[-4]
## [1] "DE" "FR" "NL" "UK"
You can pass multiple indexes as a vector:
country_code[c(1, 2, 3)]
## [1] "DE" "FR" "NL"
1:3 generates the vector c(1, 2, 3) a bit
quicker.
country_code[1:3]
## [1] "DE" "FR" "NL"
If we want the values “DE”, “FR”, and “US”, a sequence does not really help. But we can put a vector with a combination of a sequence and some other values in the square brackets:
country_code[c(1:2, 4)]
## [1] "DE" "FR" "US"
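A few more things you can do with positions, shown as a short sketch on the same vector. Using length() to find the last position is our own addition.

```r
country_code <- c("DE", "FR", "NL", "US", "UK")

# length() gives the last position, whatever the size of the vector
country_code[length(country_code)]  # "UK"

# The order of the positions decides the order of the result
country_code[c(5, 1)]  # "UK" "DE"

# A position that does not exist returns NA instead of an error
country_code[6]  # NA
```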
We can access values of a matrix similarly. However, we need to think of one additional dimension.
example_mat <- matrix(c(1, 2, 3, 4, 5, 6),
nrow = 3,
ncol = 2)
example_mat
## [,1] [,2]
## [1,] 1 4
## [2,] 2 5
## [3,] 3 6
Generally, we type object[row, column] to access
specific rows and columns. To see how this works, let’s have a look at
our example_mat:
Now we want to access the value 6. It’s in the third row and the second column.
example_mat[3, 2]
## [1] 6
We could also select an entire column (and use it like a vector).
example_mat[, 2]
## [1] 4 5 6
By accessing values with the [] square brackets, we
can also replace values. Let’s say we want to recode the entire first
column in example_mat to 99:
example_mat[, 1] <- 99
example_mat
## [,1] [,2]
## [1,] 99 4
## [2,] 99 5
## [3,] 99 6
# And we want to recode the first and the third value in the second column
# to 91 and 100
example_mat[c(1, 3), 2] <- c(91, 100)
example_mat
## [,1] [,2]
## [1,] 99 91
## [2,] 99 5
## [3,] 99 100
This is a good start to select and recode data in an object. However, it might be a bit exhausting (maybe even impossible) to always look up the exact position in the object.
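Luckily, R also lets us select elements based on conditions. Instead of the position, we put a condition in the [] square brackets.
The operators we can use in conditions are == (equal to), != (not equal to), < (less than), > (greater than), <= (less than or equal to), >= (greater than or equal to), & (and), and | (or).
So how do conditions work? Let’s create a matrix to work with: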
vec1 <- c(1, 2, 3, 4, 5, 6)
vec2 <- c(7, 8, 9, 10, 11, 12)
# And now column-bind (cbind()) the two vectors.
example_mat <- cbind(vec1, vec2)
example_mat
## vec1 vec2
## [1,] 1 7
## [2,] 2 8
## [3,] 3 9
## [4,] 4 10
## [5,] 5 11
## [6,] 6 12
example_mat > 9 # This returns TRUE or FALSE for each value in the object.
## vec1 vec2
## [1,] FALSE FALSE
## [2,] FALSE FALSE
## [3,] FALSE FALSE
## [4,] FALSE TRUE
## [5,] FALSE TRUE
## [6,] FALSE TRUE
Now if we put this condition in square brackets we get the values for which the condition is true.
example_mat[example_mat > 9]
## [1] 10 11 12
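Conditions can be combined with & and |, and they also work for replacing values. Here is a sketch with a made-up vector (temperatures and its values are hypothetical). which() is a base R function that returns the positions where a condition is TRUE.

```r
temperatures <- c(12, -3, 7, 25, 0, 18)  # hypothetical values

# One condition
temperatures[temperatures > 10]  # 12 25 18

# Both conditions must be TRUE (and)
temperatures[temperatures > 0 & temperatures < 15]  # 12 7

# At least one condition must be TRUE (or)
temperatures[temperatures < 0 | temperatures > 20]  # -3 25

# Positions instead of values
which(temperatures > 10)  # 1 4 6

# Replace all values that meet a condition
temperatures[temperatures < 0] <- 0
temperatures  # 12 0 7 25 0 18
```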
Here comes the second round of exercises:
1. Create two vectors vec1 and vec2. vec1 should contain 1, 56, 23, 89, -3 and 5 (in that order). vec2 contains 24, 78, 32, 27, 8 and 1.
2. Now select elements of vec1 that are greater than 5 or smaller than 0.
3. Next set vec1 to zero if vec2 is greater than 30 and smaller or equal to 32.
Please solve all three steps in the next code chunk.
Working with data frames is similar to working with matrices and vectors.
Usually (and especially for this class) we want to work with existing
datasets. R knows and can load most of the standard formats of datasets,
like .csv, .xlsx (Excel), .dta
(Stata), .sav (SPSS) and many more.
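For .csv files you do not even need an extra package. The sketch below writes a tiny made-up data frame to a temporary file and reads it back with read.csv(). The object names and the temporary file are only for illustration; in practice you would point read.csv() to your own file.

```r
# A tiny data frame to save (two rows taken from the weather data used later)
small_data <- data.frame(
  city = c("Wacken", "Geisingen"),
  mean_temp = c(9.48, 8.13),
  stringsAsFactors = FALSE
)

# Write it to a temporary .csv file
csv_path <- tempfile(fileext = ".csv")
write.csv(small_data, csv_path, row.names = FALSE)

# Read the file back into R
loaded_data <- read.csv(csv_path, stringsAsFactors = FALSE)
loaded_data
```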
So far we only used R’s base functions. In order to use some more sophisticated or special R functions, we need to load libraries or packages first. Think of these libraries as extra apps that you can install on your smartphone to extend its functionality.
Right now, we want to load a dataset. To read datasets stored in the formats of other statistical software, we need the foreign package.
First, we want to have a look at what the package can do.
packageDescription("foreign")
## Package: foreign
## Priority: recommended
## Version: 0.8-84
## Date: 2022-12-06
## Title: Read Data Stored by 'Minitab', 'S', 'SAS', 'SPSS', 'Stata',
## 'Systat', 'Weka', 'dBase', ...
## Depends: R (>= 4.0.0)
## Imports: methods, utils, stats
## Authors@R: c( person("R Core Team", email = "R-core@R-project.org",
## role = c("aut", "cph", "cre")), person("Roger", "Bivand", role
## = c("ctb", "cph")), person(c("Vincent", "J."), "Carey", role =
## c("ctb", "cph")), person("Saikat", "DebRoy", role = c("ctb",
## "cph")), person("Stephen", "Eglen", role = c("ctb", "cph")),
## person("Rajarshi", "Guha", role = c("ctb", "cph")),
## person("Swetlana", "Herbrandt", role = "ctb"),
## person("Nicholas", "Lewin-Koh", role = c("ctb", "cph")),
## person("Mark", "Myatt", role = c("ctb", "cph")),
## person("Michael", "Nelson", role = "ctb"), person("Ben",
## "Pfaff", role = "ctb"), person("Brian", "Quistorff", role =
## "ctb"), person("Frank", "Warmerdam", role = c("ctb", "cph")),
## person("Stephen", "Weigand", role = c("ctb", "cph")),
## person("Free Software Foundation, Inc.", role = "cph"))
## Contact: see 'MailingList'
## Copyright: see file COPYRIGHTS
## Description: Reading and writing data stored by some versions of 'Epi
## Info', 'Minitab', 'S', 'SAS', 'SPSS', 'Stata', 'Systat',
## 'Weka', and for reading and writing some 'dBase' files.
## ByteCompile: yes
## Biarch: yes
## License: GPL (>= 2)
## BugReports: https://bugs.r-project.org
## MailingList: R-help@r-project.org
## URL: https://svn.r-project.org/R-packages/trunk/foreign/
## NeedsCompilation: yes
## Packaged: 2022-12-06 07:44:50 UTC; ripley
## Author: R Core Team [aut, cph, cre], Roger Bivand [ctb, cph], Vincent
## J. Carey [ctb, cph], Saikat DebRoy [ctb, cph], Stephen Eglen
## [ctb, cph], Rajarshi Guha [ctb, cph], Swetlana Herbrandt [ctb],
## Nicholas Lewin-Koh [ctb, cph], Mark Myatt [ctb, cph], Michael
## Nelson [ctb], Ben Pfaff [ctb], Brian Quistorff [ctb], Frank
## Warmerdam [ctb, cph], Stephen Weigand [ctb, cph], Free Software
## Foundation, Inc. [cph]
## Maintainer: R Core Team <R-core@R-project.org>
## Repository: CRAN
## Date/Publication: 2022-12-06 09:00:40 UTC
## Built: R 4.2.3; x86_64-w64-mingw32; 2023-03-15 14:08:45 UTC; windows
## ExperimentalWindowsRuntime: ucrt
## Archs: x64
##
## -- File: C:/Program Files/R/R-4.2.3/library/foreign/Meta/package.rds
# Ok this seems to be useful. So let's load the package to use it.
library(foreign)
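library() stops with an error if a package is not installed on your computer. As a cautious sketch, you can first ask whether the package is available with requireNamespace(), a base R function. The variable has_foreign is our own name, and the message only tells you what to do rather than installing anything.

```r
# TRUE if the package is installed, FALSE otherwise
has_foreign <- requireNamespace("foreign", quietly = TRUE)

if (has_foreign) {
  library(foreign)
} else {
  # install.packages() downloads a package from CRAN; you only need it once
  message('Package "foreign" is missing. Run install.packages("foreign") first.')
}
```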
You will often come across datasets which are stored as Stata data
files. Those files have the extension .dta.
Right now, we want to load the data set called
weather_data_germany_2021.dta which is already stored in the
raw_data folder in our directory:
weather_data <- read.dta("raw_data/weather_data_germany_2021.dta")
The data contains yearly temperature averages of German cities as well as their geographical location (longitude and latitude). It comes from the “Deutscher Wetterdienst” and you can find it here. Now that we have loaded the data, we can have a look at it.
With head() we can look at the first six rows of the data
set:
head(weather_data)
## city longitude latitude mean_temp
## 1 Wacken 9.387966 54.02460 9.48
## 2 Hasenkrug-Hardebek 9.855267 54.00377 9.35
## 3 Muskau, Bad 14.700810 51.56598 9.29
## 4 Geisingen 8.647358 47.92417 8.13
## 5 Frankfurt/Main 8.521294 50.02591 10.54
## 6 Großer Arber 13.133791 49.11289 3.61
But we can also look at the entire data set:
weather_data
If we only want to look at the variable names, we can use
names():
names(weather_data)
## [1] "city" "longitude" "latitude" "mean_temp"
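A few more base R functions help to get a first overview of a data frame. Since the sketch should run on its own, it rebuilds the first three rows shown by head() above as a small data frame called weather_sample (our own name) instead of loading the file.

```r
# The first three rows of the weather data, typed in by hand
weather_sample <- data.frame(
  city = c("Wacken", "Hasenkrug-Hardebek", "Muskau, Bad"),
  longitude = c(9.387966, 9.855267, 14.700810),
  latitude = c(54.02460, 54.00377, 51.56598),
  mean_temp = c(9.48, 9.35, 9.29),
  stringsAsFactors = FALSE
)

dim(weather_sample)   # 3 4, rows first, then columns
nrow(weather_sample)  # 3
ncol(weather_sample)  # 4

# Column names, types and the first values at a glance
str(weather_sample)

# Minimum, quartiles, mean and maximum of one variable
summary(weather_sample$mean_temp)
```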
Now we can use our selecting abilities on a data frame. As before we can select elements via their numeric position:
weather_data[1, 2] # first row, second column
## [1] 9.387966
weather_data[1:3, 1] # rows 1-3, first column
## [1] "Wacken" "Hasenkrug-Hardebek" "Muskau, Bad"
Additionally, as columns usually have names in data frames, we can use the column names to select values in two ways.
First, we can put the column name in square brackets instead of a column number:
weather_data[1, "city"]
## [1] "Wacken"
weather_data[, "mean_temp"]
We can also look at two variables at once:
weather_data[, c("city", "mean_temp")]
Second, we can also select an entire column by using the
$ operator with the column name:
data.frame_name$column_name. Just like this:
weather_data$mean_temp
## [1] 9.48 9.35 9.29 8.13 10.54 3.61 10.31 8.98 9.04 11.17 9.63 9.83
## [13] 7.52 9.27 8.49 8.98 9.46 9.54 8.41 10.01 8.95 9.88 8.89 9.43
## [25] 8.94 9.81 9.92 8.75 7.13 8.87 9.77 9.53 9.59 10.22 9.67 9.41
## [37] 10.16 10.29 6.65 7.47 9.44 10.13 8.23 8.51 9.69 10.45 8.37 6.50
## [49] 9.45 9.73 9.66 9.52 10.67 7.33 9.33 5.29 10.02 5.38 8.26 10.70
## [61] 9.17 8.75 7.00 9.12 9.79 7.21 8.53 8.82 9.03 7.41 9.77 9.54
## [73] 8.29 9.85 8.51 9.88 8.66 8.61 8.39 7.92 10.21 9.66 9.80 9.95
## [85] 10.15 10.23 8.61 9.43 10.24 9.95 10.40 9.42 8.28 7.52 9.23 8.26
## [97] 8.42 9.76 10.11 9.13 9.71 9.53 9.59 10.00 9.16 5.85 9.95 10.75
## [109] 6.63 10.58 9.27 9.70 9.56 10.49 5.75 9.31 9.07 9.76 8.56 9.71
## [121] 9.92 7.74 9.28 9.69 8.34 9.74 7.11 8.18 8.84 7.94 9.64 10.11
## [133] 10.78 9.43 7.91 8.91 10.98 9.33 7.47 9.31 10.35 8.95 8.93 7.54
## [145] 9.68 8.36 9.06 9.57 8.85 9.48 9.62 9.22 9.90 9.42 7.92 10.31
## [157] 6.77 7.28 9.63 9.43 8.65 9.43 8.69 10.52 8.49 10.15 9.69 8.77
## [169] 9.74 9.66 8.57 8.81 9.18 8.16 10.31 7.02 9.27 9.28 8.89 8.93
## [181] 9.13 7.56 9.30 9.04 8.80 9.92 8.24 7.94 9.81 8.68 8.66 10.37
## [193] 9.05 9.78 9.69 8.56 8.46 8.79 9.05 7.60 8.40 8.41 9.54 9.61
## [205] 8.75 8.85 9.24 8.12 9.36 9.43 7.77 9.38 9.34 9.73 9.17 6.18
## [217] 9.44 8.78 7.20 8.39 9.78 9.77 10.19 9.92 9.47 8.95 10.14 8.90
## [229] 9.74 7.79 8.69 9.30 8.30 9.48 7.76 8.62 10.52 9.65 9.23 8.77
## [241] 10.06 9.34 8.72 8.60 10.85 10.66 10.59 9.05 9.61 8.06 7.25 8.02
## [253] 3.78 10.64 9.04 8.98 9.68 9.81 8.45 9.53 9.92 8.45 9.18 9.77
## [265] 9.70 9.81 10.12 7.78 9.90 8.50 9.48 9.14 10.40 9.84 10.16 9.07
## [277] 10.85 8.63 -4.05 8.87 10.20 9.15 9.13 8.87 7.90 9.57 7.11 8.62
## [289] 10.00 9.64 8.95 9.58 9.81 8.58 9.98 8.82 9.59 9.09 10.02 9.33
## [301] 10.24 10.58 10.32 9.40 8.22 10.25 6.11 7.44 6.39 6.91 8.16 9.56
## [313] 8.97 9.35 9.49 9.38 8.36 10.33 7.57 9.57 9.48 8.15 10.16 8.52
## [325] 8.18 5.74 10.84 10.42 9.97 10.22 9.84 9.66 10.87 10.99 9.69 8.56
## [337] 8.29 8.23 9.42 7.15 8.72 9.16 8.69 6.83 9.68 8.77 9.22 8.94
## [349] 8.16 9.10 10.01 9.39 9.17 9.53 10.55 11.37 9.92 8.26 9.50 8.62
## [361] 8.63 6.49 9.89 7.87 8.62 8.90 9.56 9.66 8.68 8.90 10.06 9.50
## [373] 9.83 9.52 6.90 9.35 5.09 9.90 9.83 8.91 10.22 8.67 9.80 10.12
## [385] 8.65 9.54 8.98 8.75 9.30 3.81 8.71 9.13 8.26 10.88 8.47 9.13
## [397] 9.61 8.49 9.36 9.71 8.45 10.23 6.90 9.29 7.41 9.57 8.02 10.48
## [409] 9.18 8.68 9.88 9.50 9.87 9.93 7.37 7.96 9.16 9.66 9.25 8.57
## [421] 9.98 9.34 7.59 8.37 9.23 9.84 7.94 9.44 8.87 9.42 10.13 8.70
## [433] 9.44 9.40 8.26 8.81 8.91 9.76 10.07 8.02 10.76 8.91 10.61 9.32
## [445] 9.19 9.21 9.31 8.62 9.04 9.35 8.75 9.96 9.75 10.06 9.71 7.16
## [457] 9.79 8.78 9.90 10.52 8.14 9.68 10.03 8.50 10.33 7.80 8.30 10.05
## [469] 8.53 9.47 8.70 7.76
Columns from data frames are essentially vectors. We can use all the operations and functions we can use for vectors (depending on their class).
weather_data$mean_temp[1] # For example, we can select an element of the vector
## [1] 9.48
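Because a column is a vector, the vector functions and conditions from above carry over directly. This sketch uses the first five rows from the head() output as a small stand-in data frame (weather_sample is our own name). which.min() is a base R function that returns the position of the smallest value.

```r
# The first five rows of the weather data, typed in by hand
weather_sample <- data.frame(
  city = c("Wacken", "Hasenkrug-Hardebek", "Muskau, Bad",
           "Geisingen", "Frankfurt/Main"),
  mean_temp = c(9.48, 9.35, 9.29, 8.13, 10.54),
  stringsAsFactors = FALSE
)

# Average temperature across the five cities
mean(weather_sample$mean_temp)  # 9.358

# Which city is the coldest?
weather_sample$city[which.min(weather_sample$mean_temp)]  # "Geisingen"

# A condition on one column selects values from another column
weather_sample$city[weather_sample$mean_temp > 9.4]  # "Wacken" "Frankfurt/Main"
```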
What if we want to add a new variable? Let’s create a variable named “cold”.
weather_data$cold <- 0
# What does this do?
weather_data$cold
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [38] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [75] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [112] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [149] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [186] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [223] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [260] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [297] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [334] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [371] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [408] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## [445] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Now, we want to recode “cold” to 1 for cities whose mean temperature is lower than 8 degrees Celsius.
weather_data$cold <- 0
weather_data$cold[weather_data$mean_temp < 8] <- 1
# Let's have a look at both variables:
weather_data[, c("city", "mean_temp", "cold")]
## city mean_temp cold
## 1 Wacken 9.48 0
## 2 Hasenkrug-Hardebek 9.35 0
## 3 Muskau, Bad 9.29 0
## 4 Geisingen 8.13 0
## 5 Frankfurt/Main 10.54 0
## 6 Großer Arber 3.61 1
## 7 Öhringen 10.31 0
## 8 Schotten 8.98 0
## 9 Rothenburg ob der Tauber 9.04 0
## 10 Waghäusel-Kirrlach 11.17 0
## 11 Boltenhagen 9.63 0
## 12 Würzburg 9.83 0
## 13 Altenstadt 7.52 1
## 14 Grambow-Schwennenz 9.27 0
## 15 Neuhütten/Spessart 8.49 0
## 16 Falkenberg,Kr.Rottal-Inn 8.98 0
## 17 Lübeck-Blankensee 9.46 0
## 18 Holzdorf (Flugplatz) 9.54 0
## 19 Königshofen, Bad 8.41 0
## 20 Wusterwitz 10.01 0
## 21 Reimlingen 8.95 0
## 22 Diepholz 9.88 0
## 23 Gera-Leumnitz 8.89 0
## 24 Seesen 9.43 0
## 25 Mühlhausen/Thüringen-Görmar 8.94 0
## 26 Hameln-Hastenbeck 9.81 0
## 27 Kaiserslautern 9.92 0
## 28 Lennestadt-Theten 8.75 0
## 29 Marienberg 7.13 1
## 30 Lügde-Paenbruch 8.87 0
## 31 Wittenberg 9.77 0
## 32 Dresden-Klotzsche 9.53 0
## 33 Buchenbach 9.59 0
## 34 Hamburg-Neuwiedenthal 10.22 0
## 35 Mergentheim, Bad-Neunkirchen 9.67 0
## 36 Cölbe, Kr. Marburg-Biedenkopf 9.41 0
## 37 Ennigerloh-Ostenfelde 10.16 0
## 38 Sachsenheim 10.29 0
## 39 Hoherodskopf/Vogelsberg 6.65 1
## 40 Tirschenreuth-Lodermühl 7.47 1
## 41 Notzingen 9.44 0
## 42 Rostock-Warnemünde 10.13 0
## 43 Siegsdorf-Höll 8.23 0
## 44 Lautertal-Oberlauter 8.51 0
## 45 Hohwacht 9.69 0
## 46 Berlin-Tempelhof 10.45 0
## 47 Holzkirchen 8.37 0
## 48 Mittenwald-Buckelwiesen 6.50 1
## 49 Dörnick 9.45 0
## 50 Lauchstädt, Bad 9.73 0
## 51 Lüchow 9.66 0
## 52 Wendisch Evern 9.52 0
## 53 Geldern-Walbeck 10.67 0
## 54 Schneifelforsthaus 7.33 1
## 55 Osterfeld 9.33 0
## 56 Carlsfeld 5.29 1
## 57 Aachen-Orsbach 10.02 0
## 58 Zinnwald-Georgenfeld 5.38 1
## 59 Kaisersbach-Cronhütte 8.26 0
## 60 Geisenheim 10.70 0
## 61 Weingarten, Kr. Ravensburg 9.17 0
## 62 Twistetal-Mühlhausen 8.75 0
## 63 Schönwald/Ofr.-Brunn 7.00 1
## 64 Wutöschingen-Ofteringen 9.12 0
## 65 Greifswalder Oie 9.79 0
## 66 Reit im Winkl 7.21 1
## 67 Amerang-Pfaffing 8.53 0
## 68 Feldberg/Mecklenburg 8.82 0
## 69 Landshut-Reithof 9.03 0
## 70 Garmisch-Partenkirchen 7.41 1
## 71 Sankt Peter-Ording 9.77 0
## 72 Langenlipsdorf 9.54 0
## 73 Kronach 8.29 0
## 74 Bremen 9.85 0
## 75 Kiefersfelden-Gach 8.51 0
## 76 Friesoythe-Altenoythe 9.88 0
## 77 Bertsdorf-Hörnitz 8.66 0
## 78 Leinefelde 8.61 0
## 79 Langenwetzendorf-Göttendorf 8.39 0
## 80 Marienberg, Bad 7.92 1
## 81 Lippstadt-Bökenförde 10.21 0
## 82 Gardelegen 9.66 0
## 83 Bielefeld-Deppendorf 9.80 0
## 84 Demker 9.95 0
## 85 Dresden-Hosterwitz 10.15 0
## 86 Emmendingen-Mundingen 10.23 0
## 87 Burgwald-Bottendorf 8.61 0
## 88 Mühlacker 9.43 0
## 89 Nauheim, Bad 10.24 0
## 90 Leipzig-Holzhausen 9.95 0
## 91 Stuttgart (Schnarrenberg) 10.40 0
## 92 Anklam 9.42 0
## 93 Weiden 8.28 0
## 94 Leutkirch-Herlazhofen 7.52 1
## 95 Tribsees 9.23 0
## 96 Feuchtwangen-Heilbronn 8.26 0
## 97 Ebrach 8.42 0
## 98 Kiel-Holtenau 9.76 0
## 99 Berlin Brandenburg 10.11 0
## 100 Wittstock-Rote Mühle 9.13 0
## 101 München-Stadt 9.71 0
## 102 Bremervörde 9.53 0
## 103 Artern 9.59 0
## 104 Nienburg 10.00 0
## 105 Starkenberg-Tegkwitz 9.16 0
## 106 Kahler Asten 5.85 1
## 107 Trier-Petrisberg 9.95 0
## 108 Tönisvorst 10.75 0
## 109 Wernigerode-Schierke 6.63 1
## 110 Saarbrücken-Burbach 10.58 0
## 111 Nossen 9.27 0
## 112 Konstanz 9.70 0
## 113 Gründau-Breitenborn 9.56 0
## 114 Rheinfelden 10.49 0
## 115 Wasserkuppe 5.75 1
## 116 Sigmarszell-Zeisertsweiler 9.31 0
## 117 Großerlach-Mannenweiler 9.07 0
## 118 Kirchdorf/Poel 9.76 0
## 119 Augsburg 8.56 0
## 120 Leipzig/Halle 9.71 0
## 121 Berlin-Dahlem (FU) 9.92 0
## 122 Grainet-Rehberg 7.74 1
## 123 Wittenborn 9.28 0
## 124 Lübben-Blumenfelde 9.69 0
## 125 Lechfeld 8.34 0
## 126 Alfhausen 9.74 0
## 127 Elster, Bad-Sohl 7.11 1
## 128 Ostheim v.d. Rhön 8.18 0
## 129 Herzberg 8.84 0
## 130 Meiningen 7.94 1
## 131 Wittmundhafen 9.64 0
## 132 Alzey 10.11 0
## 133 Düsseldorf 10.78 0
## 134 Nideggen-Schmidt 9.43 0
## 135 Harzgerode 7.91 1
## 136 Geringswalde-Altgeringswalde 8.91 0
## 137 Duisburg-Baerl 10.98 0
## 138 Garsebach bei Meißen 9.33 0
## 139 Freudenstadt 7.47 1
## 140 Eschwege 9.31 0
## 141 Obersulm-Willsbach 10.35 0
## 142 Schlüchtern-Herolz 8.95 0
## 143 Roth 8.93 0
## 144 Villingen-Schwenningen 7.54 1
## 145 Boizenburg 9.68 0
## 146 Wielenbach (Demollstr.) 8.36 0
## 147 Aldersbach-Kriestorf 9.06 0
## 148 Waren (Müritz) 9.57 0
## 149 Gottfrieding 8.85 0
## 150 Möhrendorf-Kleinseebach 9.48 0
## 151 List auf Sylt 9.62 0
## 152 Grünow 9.22 0
## 153 Kitzingen 9.90 0
## 154 Coschen 9.42 0
## 155 Neustadt am Kulm-Filchendorf 7.92 1
## 156 Cuxhaven 10.31 0
## 157 Birx/Rhön 6.77 1
## 158 Oberharz am Brocken-Stiege 7.28 1
## 159 Rotenburg (Wümme) 9.63 0
## 160 Rosengarten-Klecken 9.43 0
## 161 Moorgrund Gräfen-Nitzendorf 8.65 0
## 162 Wittingen-Vorhop 9.43 0
## 163 Idar-Oberstein 8.69 0
## 164 Köln-Bonn 10.52 0
## 165 Hahn 8.49 0
## 166 Wuppertal-Buchenhofen 10.15 0
## 167 Wernigerode 9.69 0
## 168 Buchen, Kr. Neckar-Odenwald 8.77 0
## 169 Hamburg-Fuhlsbüttel 9.74 0
## 170 Hoyerswerda 9.66 0
## 171 Laupheim 8.57 0
## 172 Löhnberg-Obershausen 8.81 0
## 173 Reichshof-Eckenhagen 9.18 0
## 174 Rottweil 8.16 0
## 175 Offenbach-Wetterpark 10.31 0
## 176 Teuschnitz 7.02 1
## 177 Singen 9.27 0
## 178 Putbus 9.28 0
## 179 Elsendorf-Horneck 8.89 0
## 180 Chieming 8.93 0
## 181 Hohn 9.13 0
## 182 Hohenpeißenberg 7.56 1
## 183 Ostenfeld (Rendsburg) 9.30 0
## 184 Menz 9.04 0
## 185 Metten 8.80 0
## 186 Belm 9.92 0
## 187 Pommelsbrunn-Mittelburg 8.24 0
## 188 Schleiz 7.94 1
## 189 Emden 9.81 0
## 190 Blankenrath 8.68 0
## 191 Lichtenhain-Mittelndorf 8.66 0
## 192 Weilerswist-Lommersum 10.37 0
## 193 Ruppertsecken 9.05 0
## 194 Aschersleben-Mehringen 9.78 0
## 195 Kubschütz, Kr. Bautzen 9.69 0
## 196 Meinerzhagen-Redlendorf 8.56 0
## 197 Plauen 8.46 0
## 198 Hechingen 8.79 0
## 199 Fritzlar (Flugplatz) 9.05 0
## 200 Pfullendorf 7.60 1
## 201 Tann/Rhön 8.40 0
## 202 Sandberg 8.41 0
## 203 Wolfach 9.54 0
## 204 Greifswald 9.61 0
## 205 Fürstenzell 8.75 0
## 206 Amberg-Unterammersricht 8.85 0
## 207 Karlshagen 9.24 0
## 208 Oberhaching-Laufzorn 8.12 0
## 209 Quickborn 9.36 0
## 210 Kyritz 9.43 0
## 211 Kohlgrub, Bad (Rosshof) 7.77 1
## 212 Heckelberg 9.38 0
## 213 Nürnberg 9.34 0
## 214 Freiburg/Elbe 9.73 0
## 215 Görlitz 9.17 0
## 216 Kleiner Inselsberg 6.18 1
## 217 Steinau, Kr. Cuxhaven 9.44 0
## 218 Eisenach 8.78 0
## 219 Berleburg, Bad-Stünzel 7.20 1
## 220 Manderscheid-Sonnenhof 8.39 0
## 221 Lippspringe, Bad 9.78 0
## 222 Drewitz bei Burg 9.77 0
## 223 Lüdinghausen-Brochtrup 10.19 0
## 224 Huy-Pabstorf 9.92 0
## 225 Elzach-Fisnacht 9.47 0
## 226 Oberzent-Beerfelden 8.95 0
## 227 Metzingen 10.14 0
## 228 Michelstadt-Vielbrunn 8.90 0
## 229 Klitzschen bei Torgau 9.74 0
## 230 Stötten 7.79 1
## 231 Schauenburg-Elgershausen 8.69 0
## 232 Gollhofen 9.30 0
## 233 Hermaringen-Allewind 8.30 0
## 234 Teterow 9.48 0
## 235 Prackenbach-Neuhäusl 7.76 1
## 236 Ebersberg-Halbing 8.62 0
## 237 Kahl/Main 10.52 0
## 238 Berus 9.65 0
## 239 Wiesenburg 9.23 0
## 240 Mühldorf 8.77 0
## 241 Braunschweig 10.06 0
## 242 Padenstedt (Pony-Park) 9.34 0
## 243 Moringen-Lutterbeck 8.72 0
## 244 Schwarzburg 8.60 0
## 245 Dürkheim, Bad 10.85 0
## 246 Heinsberg-Schleiden 10.66 0
## 247 Trier-Zewen 10.59 0
## 248 Querfurt-Mühle Lodersleben 9.05 0
## 249 Weiskirchen/Saar 9.61 0
## 250 Sohland/Spree 8.06 0
## 251 Klippeneck 7.25 1
## 252 Gilserberg-Moischeid 8.02 0
## 253 Brocken 3.78 1
## 254 Rheinau-Memprechtshofen 10.64 0
## 255 Waldems-Reinborn 9.04 0
## 256 Schwandorf 8.98 0
## 257 Friedrichshafen-Unterraderach 9.68 0
## 258 Hiddensee-Vitte 9.81 0
## 259 Attenkam 8.45 0
## 260 Erfde 9.53 0
## 261 Berlin-Buch 9.92 0
## 262 Gräfenberg-Kasberg 8.45 0
## 263 Wagersrott 9.18 0
## 264 Lindenberg 9.77 0
## 265 Saarbrücken-Ensheim 9.70 0
## 266 Großenkneten 9.81 0
## 267 Dresden-Strehlen 10.12 0
## 268 Oberviechtach 7.78 1
## 269 Hannover 9.90 0
## 270 Martinroda 8.50 0
## 271 Uelzen 9.48 0
## 272 Krölpa-Rockendorf 9.14 0
## 273 Borken in Westfalen 10.40 0
## 274 Neunkirchen-Seelscheid-Krawinkel 9.84 0
## 275 Ahaus 10.16 0
## 276 Naumburg/Saale-Kreipitzsch 9.07 0
## 277 Freiburg 10.85 0
## 278 Dillenburg 8.63 0
## 279 Zugspitze -4.05 1
## 280 Niederstetten 8.87 0
## 281 Schaafheim-Schlierbach 10.20 0
## 282 Rosenheim 9.15 0
## 283 Barth 9.13 0
## 284 Weißenburg-Emetzheim 8.87 0
## 285 Ulm-Mähringen 7.90 1
## 286 Ueckermünde 9.57 0
## 287 Wunsiedel-Schönbrunn 7.11 1
## 288 München-Flughafen 8.62 0
## 289 Leuchtturm Kiel 10.00 0
## 290 Doberlug-Kirchhain 9.64 0
## 291 Kissingen, Bad 8.95 0
## 292 Neunkirchen-Wellesweiler 9.58 0
## 293 Berge 9.81 0
## 294 Neuburg/Kammel-Langenhaslach 8.58 0
## 295 Celle 9.98 0
## 296 Erfurt-Weimar 8.82 0
## 297 Nordholz (Flugplatz) 9.59 0
## 298 Weimar-Schöndorf 9.09 0
## 299 Borkum-Flugplatz 10.02 0
## 300 Hümmerich 9.33 0
## 301 Kleve 10.24 0
## 302 Worms 10.58 0
## 303 Werl 10.32 0
## 304 Ummendorf 9.40 0
## 305 Dippoldiswalde-Reinberg 8.22 0
## 306 Müllheim 10.25 0
## 307 Deutschneudorf-Brüderwiese 6.11 1
## 308 Merklingen 7.44 1
## 309 Kleiner Feldberg/Taunus 6.39 1
## 310 Lenzkirch-Ruhbühl 6.91 1
## 311 Altheim, Kreis Biberach 8.16 0
## 312 Runkel-Ennerich 9.56 0
## 313 Straubing 8.97 0
## 314 Schwäbisch Gmünd-Weiler 9.35 0
## 315 Marnitz 9.49 0
## 316 Schonungen-Mainberg 9.38 0
## 317 Heinersreuth-Vollhof 8.36 0
## 318 Berlin-Marzahn 10.33 0
## 319 Hof 7.57 1
## 320 Schönhagen (Ostseebad) 9.57 0
## 321 Müncheberg 9.48 0
## 322 Memmingen 8.15 0
## 323 Barsinghausen-Hohenbostel 10.16 0
## 324 Balingen-Bronnhaupten 8.52 0
## 325 Parsberg/Oberpfalz-Eglwang 8.18 0
## 326 Neuhaus am Rennweg 5.74 1
## 327 Bergzabern, Bad 10.84 0
## 328 Essen-Bredeney 10.42 0
## 329 Rahden-Kleinendorf 9.97 0
## 330 Magdeburg 10.22 0
## 331 Gevelsberg-Oberbröking 9.84 0
## 332 Oschatz 9.66 0
## 333 Ohlsbach 10.87 0
## 334 Frankfurt/Main-Westend 10.99 0
## 335 Pforzheim-Ispringen 9.69 0
## 336 Sontra 8.56 0
## 337 Saldenburg-Entschenreuth 8.29 0
## 338 Neubulach-Oberhaugstett 8.23 0
## 339 Goldberg 9.42 0
## 340 Zwiesel 7.15 1
## 341 Altomünster-Maisbrunn 8.72 0
## 342 Steinhagen-Negast 9.16 0
## 343 Piding 8.69 0
## 344 Fichtelberg/Oberfranken-Hüttstadl 6.83 1
## 345 Pelzerhaken 9.68 0
## 346 Fulda-Horas 8.77 0
## 347 Schipkau-Klettwitz 9.22 0
## 348 Kösching 8.94 0
## 349 Nürnberg-Netzstall 8.16 0
## 350 Faßberg 9.10 0
## 351 Jena (Sternwarte) 10.01 0
## 352 Soltau 9.39 0
## 353 Olbersleben 9.17 0
## 354 Helmstedt-Emmerstedt 9.53 0
## 355 Mainz-Lerchenberg (ZDF) 10.55 0
## 356 Köln-Stammheim 11.37 0
## 357 Bernburg/Saale (Nord) 9.92 0
## 358 Brilon-Thülen 8.26 0
## 359 Zeitz 9.50 0
## 360 Ellwangen-Rindelbach 8.62 0
## 361 Alsfeld-Eifa 8.63 0
## 362 Meßstetten-Appental 6.49 1
## 363 Michelstadt 9.89 0
## 364 Sigmaringen-Laiz 7.87 1
## 365 Neu-Ulrichstein 8.62 0
## 366 Trostberg 8.90 0
## 367 Freudenberg/Main-Boxtal 9.56 0
## 368 Schwerin 9.66 0
## 369 Donauwörth-Osterweiler 8.68 0
## 370 Dillingen/Donau-Fristingen 8.90 0
## 371 Itzehoe 10.06 0
## 372 Angermünde 9.50 0
## 373 Potsdam 9.83 0
## 374 Röllbach 9.52 0
## 375 Braunlage 6.90 1
## 376 Hattstedt 9.35 0
## 377 Schmücke 5.09 1
## 378 Wangerland-Hooksiel 9.90 0
## 379 Harzburg, Bad 9.83 0
## 380 Deuselbach 8.91 0
## 381 Münster/Osnabrück 10.22 0
## 382 Simmern-Wahlbach 8.67 0
## 383 Salzuflen, Bad 9.80 0
## 384 Quedlinburg 10.12 0
## 385 Chemnitz 8.65 0
## 386 Tholey 9.54 0
## 387 Lichtentanne 8.98 0
## 388 Kirchberg/Jagst-Herboldshausen 8.75 0
## 389 Gießen/Wettenberg 9.30 0
## 390 Fichtelberg 3.81 1
## 391 Renningen-Ihinger Hof 8.71 0
## 392 Wesertal-Lippoldsberg 9.13 0
## 393 Schorndorf-Knöbling 8.26 0
## 394 Mannheim 10.88 0
## 395 Weidenbach-Weiherschneidbach 8.47 0
## 396 Arnstein-Müdesheim 9.13 0
## 397 Elpersbüttel 9.61 0
## 398 Gelbelsee 8.49 0
## 399 Schleswig 9.36 0
## 400 Lenzen/Elbe 9.71 0
## 401 Kümmersbruck 8.45 0
## 402 Bremerhaven 10.23 0
## 403 Oberstdorf 6.90 1
## 404 Göttingen 9.29 0
## 405 Lobenstein, Bad 7.41 1
## 406 Worpswede-Hüttenbusch 9.57 0
## 407 Veilsdorf 8.02 0
## 408 Königswinter-Heiderhof 10.48 0
## 409 Montabaur 9.18 0
## 410 Lüdenscheid 8.68 0
## 411 Genthin 9.88 0
## 412 Arnsberg-Neheim 9.50 0
## 413 Jeßnitz 9.87 0
## 414 Darmstadt 9.93 0
## 415 Münsingen-Apfelstetten 7.37 1
## 416 Neukirchen-Hauptschwenda 7.96 1
## 417 Regensburg 9.16 0
## 418 Stuttgart-Echterdingen 9.66 0
## 419 Arkona 9.25 0
## 420 Weihenstephan-Dürnast 8.57 0
## 421 Waibstadt 9.98 0
## 422 Trollenhagen 9.34 0
## 423 Waldmünchen 7.59 1
## 424 Eichstätt-Landershofen 8.37 0
## 425 Leck 9.23 0
## 426 Cottbus 9.84 0
## 427 Kempten 7.94 1
## 428 Zehdenick 9.44 0
## 429 Neuburg/Donau (Flugplatz) 8.87 0
## 430 Dachwig 9.42 0
## 431 Norderney 10.13 0
## 432 Waltershausen 8.70 0
## 433 Alfeld 9.44 0
## 434 Hilgenroth 9.40 0
## 435 Kall-Sistig 8.26 0
## 436 Weinbiet 8.81 0
## 437 Warburg 8.91 0
## 438 Seehausen 9.76 0
## 439 Baden-Baden-Geroldsau 10.07 0
## 440 Treuen 8.02 0
## 441 Lahr 10.76 0
## 442 Hersfeld, Bad 8.91 0
## 443 Rheinstetten 10.61 0
## 444 Wiesbaden-Auringen 9.32 0
## 445 Simbach/Inn 9.19 0
## 446 Ingelfingen-Stachenhausen 9.21 0
## 447 Laage (Flugplatz) 9.31 0
## 448 Maisach-Galgen 8.62 0
## 449 Bamberg 9.04 0
## 450 Grambek 9.35 0
## 451 Harburg 8.75 0
## 452 Fehmarn 9.96 0
## 453 Manschnow 9.75 0
## 454 Leuchtturm Alte Weser 10.06 0
## 455 Bassum 9.71 0
## 456 Oy-Mittelberg-Petersthal 7.16 1
## 457 Bevern, Kr. Holzminden 9.79 0
## 458 Schmieritz-Weltwitz 8.78 0
## 459 Wolfsburg (Südwest) 9.90 0
## 460 Andernach 10.52 0
## 461 Eslohe 8.14 0
## 462 Pirmasens 9.68 0
## 463 Köthen (Anhalt) 10.03 0
## 464 Aue 8.50 0
## 465 Waltrop-Abdinghof 10.33 0
## 466 Kaufbeuren-Oberbeuren 7.80 1
## 467 Berka, Bad (Flugplatz) 8.30 0
## 468 Groß Berßen 10.05 0
## 469 Günzburg 8.53 0
## 470 Neuruppin-Alt Ruppin 9.47 0
## 471 Staffelstein, Bad-Stublang 8.70 0
## 472 Dachsberg-Wolpadingen 7.76 1
Let’s look at the Measures of Central Tendency and Variability from the lecture (starting at slide 17).
Consider the following vector:
example_vec <- c(1, 2, 3, 4, 5)
How could we calculate the mean of example_vec?
We could simply calculate it “by hand”:
(1 + 2 + 3 + 4 + 5) / 5
## [1] 3
But this is not very useful if we look at an actual vector in our data frame, e.g., mean temperature:
weather_data$mean_temp
## [1] 9.48 9.35 9.29 8.13 10.54 3.61 10.31 8.98 9.04 11.17 9.63 9.83
## [13] 7.52 9.27 8.49 8.98 9.46 9.54 8.41 10.01 8.95 9.88 8.89 9.43
## [25] 8.94 9.81 9.92 8.75 7.13 8.87 9.77 9.53 9.59 10.22 9.67 9.41
## [37] 10.16 10.29 6.65 7.47 9.44 10.13 8.23 8.51 9.69 10.45 8.37 6.50
## [49] 9.45 9.73 9.66 9.52 10.67 7.33 9.33 5.29 10.02 5.38 8.26 10.70
## [61] 9.17 8.75 7.00 9.12 9.79 7.21 8.53 8.82 9.03 7.41 9.77 9.54
## [73] 8.29 9.85 8.51 9.88 8.66 8.61 8.39 7.92 10.21 9.66 9.80 9.95
## [85] 10.15 10.23 8.61 9.43 10.24 9.95 10.40 9.42 8.28 7.52 9.23 8.26
## [97] 8.42 9.76 10.11 9.13 9.71 9.53 9.59 10.00 9.16 5.85 9.95 10.75
## [109] 6.63 10.58 9.27 9.70 9.56 10.49 5.75 9.31 9.07 9.76 8.56 9.71
## [121] 9.92 7.74 9.28 9.69 8.34 9.74 7.11 8.18 8.84 7.94 9.64 10.11
## [133] 10.78 9.43 7.91 8.91 10.98 9.33 7.47 9.31 10.35 8.95 8.93 7.54
## [145] 9.68 8.36 9.06 9.57 8.85 9.48 9.62 9.22 9.90 9.42 7.92 10.31
## [157] 6.77 7.28 9.63 9.43 8.65 9.43 8.69 10.52 8.49 10.15 9.69 8.77
## [169] 9.74 9.66 8.57 8.81 9.18 8.16 10.31 7.02 9.27 9.28 8.89 8.93
## [181] 9.13 7.56 9.30 9.04 8.80 9.92 8.24 7.94 9.81 8.68 8.66 10.37
## [193] 9.05 9.78 9.69 8.56 8.46 8.79 9.05 7.60 8.40 8.41 9.54 9.61
## [205] 8.75 8.85 9.24 8.12 9.36 9.43 7.77 9.38 9.34 9.73 9.17 6.18
## [217] 9.44 8.78 7.20 8.39 9.78 9.77 10.19 9.92 9.47 8.95 10.14 8.90
## [229] 9.74 7.79 8.69 9.30 8.30 9.48 7.76 8.62 10.52 9.65 9.23 8.77
## [241] 10.06 9.34 8.72 8.60 10.85 10.66 10.59 9.05 9.61 8.06 7.25 8.02
## [253] 3.78 10.64 9.04 8.98 9.68 9.81 8.45 9.53 9.92 8.45 9.18 9.77
## [265] 9.70 9.81 10.12 7.78 9.90 8.50 9.48 9.14 10.40 9.84 10.16 9.07
## [277] 10.85 8.63 -4.05 8.87 10.20 9.15 9.13 8.87 7.90 9.57 7.11 8.62
## [289] 10.00 9.64 8.95 9.58 9.81 8.58 9.98 8.82 9.59 9.09 10.02 9.33
## [301] 10.24 10.58 10.32 9.40 8.22 10.25 6.11 7.44 6.39 6.91 8.16 9.56
## [313] 8.97 9.35 9.49 9.38 8.36 10.33 7.57 9.57 9.48 8.15 10.16 8.52
## [325] 8.18 5.74 10.84 10.42 9.97 10.22 9.84 9.66 10.87 10.99 9.69 8.56
## [337] 8.29 8.23 9.42 7.15 8.72 9.16 8.69 6.83 9.68 8.77 9.22 8.94
## [349] 8.16 9.10 10.01 9.39 9.17 9.53 10.55 11.37 9.92 8.26 9.50 8.62
## [361] 8.63 6.49 9.89 7.87 8.62 8.90 9.56 9.66 8.68 8.90 10.06 9.50
## [373] 9.83 9.52 6.90 9.35 5.09 9.90 9.83 8.91 10.22 8.67 9.80 10.12
## [385] 8.65 9.54 8.98 8.75 9.30 3.81 8.71 9.13 8.26 10.88 8.47 9.13
## [397] 9.61 8.49 9.36 9.71 8.45 10.23 6.90 9.29 7.41 9.57 8.02 10.48
## [409] 9.18 8.68 9.88 9.50 9.87 9.93 7.37 7.96 9.16 9.66 9.25 8.57
## [421] 9.98 9.34 7.59 8.37 9.23 9.84 7.94 9.44 8.87 9.42 10.13 8.70
## [433] 9.44 9.40 8.26 8.81 8.91 9.76 10.07 8.02 10.76 8.91 10.61 9.32
## [445] 9.19 9.21 9.31 8.62 9.04 9.35 8.75 9.96 9.75 10.06 9.71 7.16
## [457] 9.79 8.78 9.90 10.52 8.14 9.68 10.03 8.50 10.33 7.80 8.30 10.05
## [469] 8.53 9.47 8.70 7.76
Typing up all the entries individually would take a lot of time. Instead, we can use two functions that we have already seen, sum and length.
sum(weather_data$mean_temp) / length(weather_data$mean_temp)
## [1] 9.037903
Fortunately, R provides a much easier way to calculate a mean:
mean(weather_data$mean_temp) # That was easy.
## [1] 9.037903
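As a quick check, the three approaches can be compared on the small example_vec from above. This is only a sketch; the object names by_hand, by_formula and by_function are made up for this illustration.

```r
# The small example vector from above
example_vec <- c(1, 2, 3, 4, 5)

by_hand     <- (1 + 2 + 3 + 4 + 5) / 5                # typing every value
by_formula  <- sum(example_vec) / length(example_vec) # sum divided by count
by_function <- mean(example_vec)                      # built-in function

# All three approaches give the same result
c(by_hand, by_formula, by_function)
```

The built-in function is the shortest to write and the least error-prone, which is why we use it from now on.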
But be sure that your vector is numeric. Could you calculate the mean of city?
weather_data$city
## [1] "Wacken" "Hasenkrug-Hardebek"
## [3] "Muskau, Bad" "Geisingen"
## [5] "Frankfurt/Main" "Großer Arber"
## [7] "Öhringen" "Schotten"
## [9] "Rothenburg ob der Tauber" "Waghäusel-Kirrlach"
## [11] "Boltenhagen" "Würzburg"
## [13] "Altenstadt" "Grambow-Schwennenz"
## [15] "Neuhütten/Spessart" "Falkenberg,Kr.Rottal-Inn"
## [17] "Lübeck-Blankensee" "Holzdorf (Flugplatz)"
## [19] "Königshofen, Bad" "Wusterwitz"
## [21] "Reimlingen" "Diepholz"
## [23] "Gera-Leumnitz" "Seesen"
## [25] "Mühlhausen/Thüringen-Görmar" "Hameln-Hastenbeck"
## [27] "Kaiserslautern" "Lennestadt-Theten"
## [29] "Marienberg" "Lügde-Paenbruch"
## [31] "Wittenberg" "Dresden-Klotzsche"
## [33] "Buchenbach" "Hamburg-Neuwiedenthal"
## [35] "Mergentheim, Bad-Neunkirchen" "Cölbe, Kr. Marburg-Biedenkopf"
## [37] "Ennigerloh-Ostenfelde" "Sachsenheim"
## [39] "Hoherodskopf/Vogelsberg" "Tirschenreuth-Lodermühl"
## [41] "Notzingen" "Rostock-Warnemünde"
## [43] "Siegsdorf-Höll" "Lautertal-Oberlauter"
## [45] "Hohwacht" "Berlin-Tempelhof"
## [47] "Holzkirchen" "Mittenwald-Buckelwiesen"
## [49] "Dörnick" "Lauchstädt, Bad"
## [51] "Lüchow" "Wendisch Evern"
## [53] "Geldern-Walbeck" "Schneifelforsthaus"
## [55] "Osterfeld" "Carlsfeld"
## [57] "Aachen-Orsbach" "Zinnwald-Georgenfeld"
## [59] "Kaisersbach-Cronhütte" "Geisenheim"
## [61] "Weingarten, Kr. Ravensburg" "Twistetal-Mühlhausen"
## [63] "Schönwald/Ofr.-Brunn" "Wutöschingen-Ofteringen"
## [65] "Greifswalder Oie" "Reit im Winkl"
## [67] "Amerang-Pfaffing" "Feldberg/Mecklenburg"
## [69] "Landshut-Reithof" "Garmisch-Partenkirchen"
## [71] "Sankt Peter-Ording" "Langenlipsdorf"
## [73] "Kronach" "Bremen"
## [75] "Kiefersfelden-Gach" "Friesoythe-Altenoythe"
## [77] "Bertsdorf-Hörnitz" "Leinefelde"
## [79] "Langenwetzendorf-Göttendorf" "Marienberg, Bad"
## [81] "Lippstadt-Bökenförde" "Gardelegen"
## [83] "Bielefeld-Deppendorf" "Demker"
## [85] "Dresden-Hosterwitz" "Emmendingen-Mundingen"
## [87] "Burgwald-Bottendorf" "Mühlacker"
## [89] "Nauheim, Bad" "Leipzig-Holzhausen"
## [91] "Stuttgart (Schnarrenberg)" "Anklam"
## [93] "Weiden" "Leutkirch-Herlazhofen"
## [95] "Tribsees" "Feuchtwangen-Heilbronn"
## [97] "Ebrach" "Kiel-Holtenau"
## [99] "Berlin Brandenburg" "Wittstock-Rote Mühle"
## [101] "München-Stadt" "Bremervörde"
## [103] "Artern" "Nienburg"
## [105] "Starkenberg-Tegkwitz" "Kahler Asten"
## [107] "Trier-Petrisberg" "Tönisvorst"
## [109] "Wernigerode-Schierke" "Saarbrücken-Burbach"
## [111] "Nossen" "Konstanz"
## [113] "Gründau-Breitenborn" "Rheinfelden"
## [115] "Wasserkuppe" "Sigmarszell-Zeisertsweiler"
## [117] "Großerlach-Mannenweiler" "Kirchdorf/Poel"
## [119] "Augsburg" "Leipzig/Halle"
## [121] "Berlin-Dahlem (FU)" "Grainet-Rehberg"
## [123] "Wittenborn" "Lübben-Blumenfelde"
## [125] "Lechfeld" "Alfhausen"
## [127] "Elster, Bad-Sohl" "Ostheim v.d. Rhön"
## [129] "Herzberg" "Meiningen"
## [131] "Wittmundhafen" "Alzey"
## [133] "Düsseldorf" "Nideggen-Schmidt"
## [135] "Harzgerode" "Geringswalde-Altgeringswalde"
## [137] "Duisburg-Baerl" "Garsebach bei Meißen"
## [139] "Freudenstadt" "Eschwege"
## [141] "Obersulm-Willsbach" "Schlüchtern-Herolz"
## [143] "Roth" "Villingen-Schwenningen"
## [145] "Boizenburg" "Wielenbach (Demollstr.)"
## [147] "Aldersbach-Kriestorf" "Waren (Müritz)"
## [149] "Gottfrieding" "Möhrendorf-Kleinseebach"
## [151] "List auf Sylt" "Grünow"
## [153] "Kitzingen" "Coschen"
## [155] "Neustadt am Kulm-Filchendorf" "Cuxhaven"
## [157] "Birx/Rhön" "Oberharz am Brocken-Stiege"
## [159] "Rotenburg (Wümme)" "Rosengarten-Klecken"
## [161] "Moorgrund Gräfen-Nitzendorf" "Wittingen-Vorhop"
## [163] "Idar-Oberstein" "Köln-Bonn"
## [165] "Hahn" "Wuppertal-Buchenhofen"
## [167] "Wernigerode" "Buchen, Kr. Neckar-Odenwald"
## [169] "Hamburg-Fuhlsbüttel" "Hoyerswerda"
## [171] "Laupheim" "Löhnberg-Obershausen"
## [173] "Reichshof-Eckenhagen" "Rottweil"
## [175] "Offenbach-Wetterpark" "Teuschnitz"
## [177] "Singen" "Putbus"
## [179] "Elsendorf-Horneck" "Chieming"
## [181] "Hohn" "Hohenpeißenberg"
## [183] "Ostenfeld (Rendsburg)" "Menz"
## [185] "Metten" "Belm"
## [187] "Pommelsbrunn-Mittelburg" "Schleiz"
## [189] "Emden" "Blankenrath"
## [191] "Lichtenhain-Mittelndorf" "Weilerswist-Lommersum"
## [193] "Ruppertsecken" "Aschersleben-Mehringen"
## [195] "Kubschütz, Kr. Bautzen" "Meinerzhagen-Redlendorf"
## [197] "Plauen" "Hechingen"
## [199] "Fritzlar (Flugplatz)" "Pfullendorf"
## [201] "Tann/Rhön" "Sandberg"
## [203] "Wolfach" "Greifswald"
## [205] "Fürstenzell" "Amberg-Unterammersricht"
## [207] "Karlshagen" "Oberhaching-Laufzorn"
## [209] "Quickborn" "Kyritz"
## [211] "Kohlgrub, Bad (Rosshof)" "Heckelberg"
## [213] "Nürnberg" "Freiburg/Elbe"
## [215] "Görlitz" "Kleiner Inselsberg"
## [217] "Steinau, Kr. Cuxhaven" "Eisenach"
## [219] "Berleburg, Bad-Stünzel" "Manderscheid-Sonnenhof"
## [221] "Lippspringe, Bad" "Drewitz bei Burg"
## [223] "Lüdinghausen-Brochtrup" "Huy-Pabstorf"
## [225] "Elzach-Fisnacht" "Oberzent-Beerfelden"
## [227] "Metzingen" "Michelstadt-Vielbrunn"
## [229] "Klitzschen bei Torgau" "Stötten"
## [231] "Schauenburg-Elgershausen" "Gollhofen"
## [233] "Hermaringen-Allewind" "Teterow"
## [235] "Prackenbach-Neuhäusl" "Ebersberg-Halbing"
## [237] "Kahl/Main" "Berus"
## [239] "Wiesenburg" "Mühldorf"
## [241] "Braunschweig" "Padenstedt (Pony-Park)"
## [243] "Moringen-Lutterbeck" "Schwarzburg"
## [245] "Dürkheim, Bad" "Heinsberg-Schleiden"
## [247] "Trier-Zewen" "Querfurt-Mühle Lodersleben"
## [249] "Weiskirchen/Saar" "Sohland/Spree"
## [251] "Klippeneck" "Gilserberg-Moischeid"
## [253] "Brocken" "Rheinau-Memprechtshofen"
## [255] "Waldems-Reinborn" "Schwandorf"
## [257] "Friedrichshafen-Unterraderach" "Hiddensee-Vitte"
## [259] "Attenkam" "Erfde"
## [261] "Berlin-Buch" "Gräfenberg-Kasberg"
## [263] "Wagersrott" "Lindenberg"
## [265] "Saarbrücken-Ensheim" "Großenkneten"
## [267] "Dresden-Strehlen" "Oberviechtach"
## [269] "Hannover" "Martinroda"
## [271] "Uelzen" "Krölpa-Rockendorf"
## [273] "Borken in Westfalen" "Neunkirchen-Seelscheid-Krawinkel"
## [275] "Ahaus" "Naumburg/Saale-Kreipitzsch"
## [277] "Freiburg" "Dillenburg"
## [279] "Zugspitze" "Niederstetten"
## [281] "Schaafheim-Schlierbach" "Rosenheim"
## [283] "Barth" "Weißenburg-Emetzheim"
## [285] "Ulm-Mähringen" "Ueckermünde"
## [287] "Wunsiedel-Schönbrunn" "München-Flughafen"
## [289] "Leuchtturm Kiel" "Doberlug-Kirchhain"
## [291] "Kissingen, Bad" "Neunkirchen-Wellesweiler"
## [293] "Berge" "Neuburg/Kammel-Langenhaslach"
## [295] "Celle" "Erfurt-Weimar"
## [297] "Nordholz (Flugplatz)" "Weimar-Schöndorf"
## [299] "Borkum-Flugplatz" "Hümmerich"
## [301] "Kleve" "Worms"
## [303] "Werl" "Ummendorf"
## [305] "Dippoldiswalde-Reinberg" "Müllheim"
## [307] "Deutschneudorf-Brüderwiese" "Merklingen"
## [309] "Kleiner Feldberg/Taunus" "Lenzkirch-Ruhbühl"
## [311] "Altheim, Kreis Biberach" "Runkel-Ennerich"
## [313] "Straubing" "Schwäbisch Gmünd-Weiler"
## [315] "Marnitz" "Schonungen-Mainberg"
## [317] "Heinersreuth-Vollhof" "Berlin-Marzahn"
## [319] "Hof" "Schönhagen (Ostseebad)"
## [321] "Müncheberg" "Memmingen"
## [323] "Barsinghausen-Hohenbostel" "Balingen-Bronnhaupten"
## [325] "Parsberg/Oberpfalz-Eglwang" "Neuhaus am Rennweg"
## [327] "Bergzabern, Bad" "Essen-Bredeney"
## [329] "Rahden-Kleinendorf" "Magdeburg"
## [331] "Gevelsberg-Oberbröking" "Oschatz"
## [333] "Ohlsbach" "Frankfurt/Main-Westend"
## [335] "Pforzheim-Ispringen" "Sontra"
## [337] "Saldenburg-Entschenreuth" "Neubulach-Oberhaugstett"
## [339] "Goldberg" "Zwiesel"
## [341] "Altomünster-Maisbrunn" "Steinhagen-Negast"
## [343] "Piding" "Fichtelberg/Oberfranken-Hüttstadl"
## [345] "Pelzerhaken" "Fulda-Horas"
## [347] "Schipkau-Klettwitz" "Kösching"
## [349] "Nürnberg-Netzstall" "Faßberg"
## [351] "Jena (Sternwarte)" "Soltau"
## [353] "Olbersleben" "Helmstedt-Emmerstedt"
## [355] "Mainz-Lerchenberg (ZDF)" "Köln-Stammheim"
## [357] "Bernburg/Saale (Nord)" "Brilon-Thülen"
## [359] "Zeitz" "Ellwangen-Rindelbach"
## [361] "Alsfeld-Eifa" "Meßstetten-Appental"
## [363] "Michelstadt" "Sigmaringen-Laiz"
## [365] "Neu-Ulrichstein" "Trostberg"
## [367] "Freudenberg/Main-Boxtal" "Schwerin"
## [369] "Donauwörth-Osterweiler" "Dillingen/Donau-Fristingen"
## [371] "Itzehoe" "Angermünde"
## [373] "Potsdam" "Röllbach"
## [375] "Braunlage" "Hattstedt"
## [377] "Schmücke" "Wangerland-Hooksiel"
## [379] "Harzburg, Bad" "Deuselbach"
## [381] "Münster/Osnabrück" "Simmern-Wahlbach"
## [383] "Salzuflen, Bad" "Quedlinburg"
## [385] "Chemnitz" "Tholey"
## [387] "Lichtentanne" "Kirchberg/Jagst-Herboldshausen"
## [389] "Gießen/Wettenberg" "Fichtelberg"
## [391] "Renningen-Ihinger Hof" "Wesertal-Lippoldsberg"
## [393] "Schorndorf-Knöbling" "Mannheim"
## [395] "Weidenbach-Weiherschneidbach" "Arnstein-Müdesheim"
## [397] "Elpersbüttel" "Gelbelsee"
## [399] "Schleswig" "Lenzen/Elbe"
## [401] "Kümmersbruck" "Bremerhaven"
## [403] "Oberstdorf" "Göttingen"
## [405] "Lobenstein, Bad" "Worpswede-Hüttenbusch"
## [407] "Veilsdorf" "Königswinter-Heiderhof"
## [409] "Montabaur" "Lüdenscheid"
## [411] "Genthin" "Arnsberg-Neheim"
## [413] "Jeßnitz" "Darmstadt"
## [415] "Münsingen-Apfelstetten" "Neukirchen-Hauptschwenda"
## [417] "Regensburg" "Stuttgart-Echterdingen"
## [419] "Arkona" "Weihenstephan-Dürnast"
## [421] "Waibstadt" "Trollenhagen"
## [423] "Waldmünchen" "Eichstätt-Landershofen"
## [425] "Leck" "Cottbus"
## [427] "Kempten" "Zehdenick"
## [429] "Neuburg/Donau (Flugplatz)" "Dachwig"
## [431] "Norderney" "Waltershausen"
## [433] "Alfeld" "Hilgenroth"
## [435] "Kall-Sistig" "Weinbiet"
## [437] "Warburg" "Seehausen"
## [439] "Baden-Baden-Geroldsau" "Treuen"
## [441] "Lahr" "Hersfeld, Bad"
## [443] "Rheinstetten" "Wiesbaden-Auringen"
## [445] "Simbach/Inn" "Ingelfingen-Stachenhausen"
## [447] "Laage (Flugplatz)" "Maisach-Galgen"
## [449] "Bamberg" "Grambek"
## [451] "Harburg" "Fehmarn"
## [453] "Manschnow" "Leuchtturm Alte Weser"
## [455] "Bassum" "Oy-Mittelberg-Petersthal"
## [457] "Bevern, Kr. Holzminden" "Schmieritz-Weltwitz"
## [459] "Wolfsburg (Südwest)" "Andernach"
## [461] "Eslohe" "Pirmasens"
## [463] "Köthen (Anhalt)" "Aue"
## [465] "Waltrop-Abdinghof" "Kaufbeuren-Oberbeuren"
## [467] "Berka, Bad (Flugplatz)" "Groß Berßen"
## [469] "Günzburg" "Neuruppin-Alt Ruppin"
## [471] "Staffelstein, Bad-Stublang" "Dachsberg-Wolpadingen"
Let’s try to calculate the mean.
mean(weather_data$city)
## Warning in mean.default(weather_data$city): argument is not numeric or logical:
## returning NA
## [1] NA
It does not work! Even by hand, we could not calculate the mean of a character vector, because its values are text and not numbers.
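One way to avoid this surprise is to check the type of a vector before calculating anything. The sketch below uses two small toy vectors (temps and cities are hypothetical names, not columns of weather_data) together with the base R functions is.numeric() and class().

```r
# Hypothetical toy vectors for illustration
temps  <- c(9.48, 9.35, 9.29)
cities <- c("Wacken", "Mannheim", "Hof")

is.numeric(temps)   # TRUE: mean() is meaningful here
is.numeric(cities)  # FALSE: mean() would return NA with a warning
class(cities)       # "character"

# Only compute the mean when the vector is numeric
if (is.numeric(temps)) {
  mean(temps)
}
```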
Here is an overview of functions for measures of central tendency and variability:
mean(), median(), var(), sd(), range(), IQR()
You can try them out here:
# Median
median(weather_data$mean_temp)
## [1] 9.28
# Variance
var(weather_data$mean_temp)
## [1] 1.566767
# Standard deviation
sd(weather_data$mean_temp)
## [1] 1.251706
# Range
range(weather_data$mean_temp)
## [1] -4.05 11.37
# Interquartile range (IQR)
IQR(weather_data$mean_temp)
## [1] 1.21
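To see what these functions do, it can help to rebuild some of them by hand on the small example_vec from the beginning. The following is a sketch; var_by_hand and sd_by_hand are names chosen for this illustration. Note that var() in R divides by n - 1 (the sample variance), not by n.

```r
example_vec <- c(1, 2, 3, 4, 5)
n <- length(example_vec)

# Sample variance: squared deviations from the mean, divided by n - 1
var_by_hand <- sum((example_vec - mean(example_vec))^2) / (n - 1)
# Standard deviation: square root of the variance
sd_by_hand <- sqrt(var_by_hand)

var_by_hand                               # 2.5
all.equal(var_by_hand, var(example_vec))  # TRUE
all.equal(sd_by_hand, sd(example_vec))    # TRUE

# range() returns minimum and maximum; diff() turns them into one number
diff(range(example_vec))                  # 4

# IQR() is the distance between the 75% and the 25% quantile
quantile(example_vec, 0.75) - quantile(example_vec, 0.25)
```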
Unfortunately, there is no direct function to get the mode. The solutions you will find online are all a bit advanced. So the easiest solution is to look for the mode using a frequency table.
table(weather_data$cold)
##
## 0 1
## 409 63
The table() function shows you how often each value is
in the vector. You can now identify the most frequent value.
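If you do not want to read the mode off the table by eye, one possible approach is to combine table() with which.max(). This is a sketch on a hypothetical toy vector (toy_cold, freq and freq_mode are made-up names), not an official mode function.

```r
# Hypothetical toy vector of 0/1 values
toy_cold <- c(0, 0, 1, 0, 1, 0)

freq <- table(toy_cold)                    # frequency table
freq_mode <- names(freq)[which.max(freq)]  # label of the most frequent value
freq_mode                                  # "0"
```

Two caveats: the result is returned as a character string, and if several values are tied for the highest frequency, which.max() only reports the first of them.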
Now we will work with the weather_data data set. It is
already loaded for you and you can use it right away.
Show the values of the variable mean_temp that are over
10.
Generate a new variable and call it hot that is 0
for mean temperatures < 10 and 1 for mean
temperatures > 10 degrees Celsius.
Have a look at your data set.
Please solve all three steps in the next code chunk.
This is a little trickier: Can you find the hottest and coldest city in Germany in 2021?
Hint: The functions min() and max() help
you to find the minimum and maximum values of a vector or variable.
Combine that with your newly learned subsetting skills and you’ll find
the answer.
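To illustrate the idea behind the hint, here is a sketch on a made-up toy data frame. The object toy, its columns and all values are hypothetical; you will need to adapt the pattern to weather_data yourself.

```r
# Hypothetical toy data frame; the values are made up for illustration
toy <- data.frame(
  city = c("A-Town", "B-Ville", "C-Berg"),
  temp = c(9.1, 11.3, 4.2),
  stringsAsFactors = FALSE
)

max(toy$temp)                        # highest value: 11.3
toy$city[toy$temp == max(toy$temp)]  # city with the highest value: "B-Ville"
toy$city[toy$temp == min(toy$temp)]  # city with the lowest value: "C-Berg"
```

The logical comparison inside the square brackets is TRUE only for the row that holds the maximum (or minimum), so only that city is returned.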
We will continue working with the weather data set.
Calculate the mean value of latitude and save the result as
mean_latitude.
Calculate the variance of latitude and save the result as
var_latitude.
Calculate the standard deviation of latitude and save the result
as sd_latitude.
Let’s have a short look at our data again. Remember:
head() shows you the first six entries of your data. It is
very useful to get a look at the data structure when you have a lot of
rows in your dataset.
head(weather_data)
## city longitude latitude mean_temp cold
## 1 Wacken 9.387966 54.02460 9.48 0
## 2 Hasenkrug-Hardebek 9.855267 54.00377 9.35 0
## 3 Muskau, Bad 14.700810 51.56598 9.29 0
## 4 Geisingen 8.647358 47.92417 8.13 0
## 5 Frankfurt/Main 8.521294 50.02591 10.54 0
## 6 Großer Arber 13.133791 49.11289 3.61 1
Now we can create a simple scatterplot:
plot(
x = weather_data$longitude,
y = weather_data$mean_temp
)
To get a nicer plot, we can adjust many things. We suggest always making those adjustments explicitly and in the same order.
plot(
x = weather_data$longitude,
y = weather_data$mean_temp,
type = "p", # This explicitly says that we want points. You could also try "l".
main = "Mean temperatures of German cities", # This adds a title to the plot
xlab = "Longitude (West - East)", # This labels the x-axis.
ylab = "Mean Temperature in 2021", # What does this do then?
las = 1, # This affects the tick labels of the y-axis.
pch = 19, # Here we choose what symbols we want to plot.
col = "black", # What color should the symbols have?
frame = F # No box around the plot.
)
We can also adjust the colors. Let’s highlight Mannheim!
Pro Tip: To add color to your data visualizations, use the viridis package.
Viridis colors make plots easier to read for people with color blindness, and they print well in greyscale. You probably don’t want to have plots like this:
We first need a vector that gives us the right colors with respect to the city variable.
library(viridis)
## Loading required package: viridisLite
# we need two colors, this is how we get them:
two_colors <- viridis(2)
two_colors # these are so-called HEX color codes
## [1] "#440154FF" "#FDE725FF"
# we use the first color for Mannheim and the second for all other cities
mannheim_color <- ifelse(weather_data$city == "Mannheim", two_colors[1], two_colors[2])
# let's have a look:
table(mannheim_color)
## mannheim_color
## #440154FF #FDE725FF
## 1 471
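If ifelse() is new to you, the following sketch shows what it does on a tiny toy vector. The names toy_city and toy_color and the two color names are hypothetical and only serve as an illustration.

```r
# Hypothetical toy vector of city names
toy_city <- c("Hof", "Mannheim", "Kiel")

# ifelse(test, yes, no) checks every element of the test
# and returns "yes" where it is TRUE and "no" where it is FALSE
toy_color <- ifelse(toy_city == "Mannheim", "purple", "yellow")
toy_color  # "yellow" "purple" "yellow"
```

The result has the same length as the input, which is exactly what plot() needs: one color per point.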
Now we can use this vector to give Mannheim a different color than the other cities:
plot(
x = weather_data$longitude,
y = weather_data$mean_temp,
type = "p", # This explicitly says that we want points. You could also try "l".
main = "Mean temperatures of German cities", # This adds a title to the plot
xlab = "Longitude (West - East)", # This labels the x-axis.
ylab = "Mean Temperature in 2021", # What does this do then?
las = 1, # This affects the tick labels of the y-axis.
pch = 19, # Here we choose what symbols we want to plot.
col = mannheim_color, # Instead of just black we now use the color vector.
frame = F # No frame around the plot.
)
Now that we use different colors, we also need a legend to know which color is which.
plot(
x = weather_data$longitude,
y = weather_data$mean_temp,
type = "p", # This explicitly says that we want points. You could also try "l".
main = "Mean temperatures of German cities", # This adds a title to the plot
xlab = "Longitude (West - East)", # This labels the x-axis.
ylab = "Mean Temperature in 2021", # What does this do then?
las = 1, # This affects the tick labels of the y-axis.
pch = 19, # Here we choose what symbols we want to plot.
col = mannheim_color, # Instead of just black we now use the color vector.
frame = F # No frame around the plot.
)
legend(
"bottomleft", # Locate the legend in the bottom-left corner.
legend = c("Mannheim", "other"), # Give it labels.
pch = 19, # Specify symbols as in the scatterplot.
col = two_colors, # Specify colors.
bty = "n" # No box around the legend.
)
plot(
x = weather_data$longitude,
y = weather_data$mean_temp,
type = "p", # This explicitly says that we want points. You could also try "l".
main = "Mean temperatures of German cities", # This adds a title to the plot
xlab = "Longitude (West - East)", # This labels the x-axis.
ylab = "Mean Temperature in 2021", # What does this do then?
las = 1, # This affects the tick labels of the y-axis.
pch = 19, # Here we choose what symbols we want to plot.
col = mannheim_color, # Instead of just black we now use the color vector.
frame = F # No frame around the plot.
)
# we want to label the point that refers to Mannheim
# We can do that with the text() function,
# But we need to subset the data, so that only Mannheim gets labelled,
# and no other city
text(
x = weather_data$longitude[weather_data$city == "Mannheim"], # subset Mannheim
y = weather_data$mean_temp[weather_data$city == "Mannheim"], # subset Mannheim
labels = "Mannheim", # label Mannheim as "Mannheim"
pos = 4 # position the label to the right of the point
)
Now we want to visualize mean temperature with a histogram. This is how you get a standard histogram:
hist(x = weather_data$mean_temp) # That's intuitive, but does not look too great
Again, we can adjust many things to make it nicer.
hist(
x = weather_data$mean_temp, # For a histogram we only specify x.
breaks = 50, # suggest the number of bins (R may adjust it slightly)
main = "A Histogram",
xlab = "Mean temperature in degree Celsius",
ylab = "Number of observations",
las = 1, # draw the axis tick labels horizontally
col = viridis(1), # One color only (first color from viridis)
border = "white" # That's the color of the bin borders.
)
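To see what hist() calculates behind the scenes, you can ask it to return the bins without drawing them. This is a sketch with hypothetical toy values (toy_temp and h are made-up names); it uses the plot = FALSE argument of hist().

```r
toy_temp <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)  # hypothetical values

# plot = FALSE returns the bins without drawing anything
h <- hist(toy_temp, breaks = 4, plot = FALSE)

h$breaks  # bin boundaries that R actually chose
h$counts  # number of observations in each bin

# Every observation falls into exactly one bin
sum(h$counts) == length(toy_temp)
```

Comparing h$breaks with the value you passed to breaks shows that a single number is treated as a suggestion and not as a strict rule.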
We can also create density plots.
plot(
density(weather_data$mean_temp), # density() takes care of x, y and type.
main = "A Simple Density Plot",
xlab = "Mean temperature in degree Celsius",
ylab = "", # The y-axis is not really meaningful here.
col = viridis(1),
lwd = 2, # Control the width of the line
frame = F,
yaxt = "n" # Remove the y-axis.
)
And we can also fill the area underneath the curve:
plot(
density(weather_data$mean_temp), # density() takes care of x, y and type.
main = "A Simple Density Plot",
xlab = "Mean temperature in degree Celsius",
ylab = "", # The y-axis is not really meaningful here.
col = viridis(1),
lwd = 2, # Control the width of the line
frame = F,
yaxt = "n" # Remove the y-axis.
)
polygon(density(weather_data$mean_temp),
col = viridis(1, alpha = 0.5) # same color but 50% transparent
)
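Both plot() and polygon() work here because density() returns a grid of x values together with the estimated density y at each of them. The sketch below looks at that object for simulated toy data (toy_temp, d and area are hypothetical names, and the values are random draws, not the weather data).

```r
set.seed(1)                                  # make the toy data reproducible
toy_temp <- rnorm(200, mean = 9, sd = 1.25)  # hypothetical values

d <- density(toy_temp)
head(d$x)  # grid of values on the x-axis
head(d$y)  # estimated density at each grid point

# The area under a density curve is approximately 1
area <- sum(d$y) * diff(d$x[1:2])
area
```

This also explains why the y-axis is hard to interpret directly: only the area under the curve has a meaning, not the height at a single point.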
boxplot(
x = weather_data$mean_temp, # As for histograms we only specify x.
main = "Boxplot of Mean temperature in degree Celsius",
ylab = "Mean temperature in degree Celsius",
las = 1,
col = plasma(1),
frame = F
)
Or a horizontal boxplot.
boxplot(
x = weather_data$mean_temp,
horizontal = T, # With horizontal = T we rotate the boxplot.
main = "Horizontal Boxplot of Mean temperature in degree Celsius",
xlab = "Mean temperature in degree Celsius",
las = 1,
frame = F
)
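If you want to know which numbers a boxplot is built from, boxplot.stats() returns them without drawing anything. This is a sketch on hypothetical toy values (toy_temp and bs are made-up names).

```r
toy_temp <- c(1, 2, 3, 4, 5, 20)  # hypothetical values with one extreme point

bs <- boxplot.stats(toy_temp)
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # points that are drawn individually as outliers
```

By default, points that lie more than 1.5 times the box length away from the box are reported in bs$out and drawn as single points.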
You learned in the lecture that boxplots have some disadvantages.
Violin plots are a very nice alternative!
This is how you get them:
library(vioplot)
## Loading required package: sm
## Package 'sm', version 2.2-5.7: type help(sm) for summary information
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
vioplot(
x = weather_data$mean_temp,
horizontal = T, # With horizontal = T we rotate the violin plot.
main = "Horizontal Violin Plot of Mean temperature in degree Celsius",
xaxt = "n",
xlab = "Mean temperature in degree Celsius",
bty = "n",
axes = FALSE,
names = "",
border = NA
)
Okay, last round of exercises for today:
Make a histogram of the latitude variable.
Make the plot look nice (label the axes, add a main title, choose colors, …).
What we learned in this session:
The first lab session and this script should equip you with all the tools (and lines of code) to tackle the first homework assignment.
Copy the lines of code that worked for something similar. Then, adjust the code according to your problem.
In terms of content, your homework asks you to inspect a data set on US presidential elections. You will calculate some measures of central tendency and variability. Finally, you will produce some nice plots.
It is best to get started with your homework as soon as possible (after it was handed out on Tuesday).
Try to write the R code first. We will provide you with an
.Rmd template to hand in your results.
In order to pass the homework assignment you need to tackle ALL problems of a problem set. For a pass you also need to get most of the problems right (or at least show us that you tried everything to get it right.)
If you have any questions concerning the lecture or the tutorial please post them to the ILIAS forum or on Slack. We will answer them on a regular basis.
Do not hesitate to come to the office hours!
And always remember: if you have a question, it is never a stupid question. In fact, most of your fellow students probably have the same or a similar question. When you ask it, everyone in this class profits.